Assessment of the PI3K cellular signaling pathway activity using mathematical modelling of target gene expression

ABSTRACT

The present invention relates to a method comprising inferring activity of a PI3K cellular signaling pathway in a tissue and/or cells and/or a body fluid of a medical subject based at least on expression levels of one or more target gene(s) of the PI3K cellular signaling pathway measured in an extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject. The present invention further relates to an apparatus comprising a digital processor configured to perform such a method, a non-transitory storage medium storing instructions that are executable by a digital processing device to perform such a method, and a computer program comprising program code means for causing a digital processing device to perform such a method.

CROSS-REFERENCE TO PRIOR APPLICATIONS

This application is a Continuation of U.S. application Ser. No. 15/023,820, filed Mar. 22, 2016, which is the U.S. National Phase application under 35 U.S.C. § 371 of International Application No. PCT/EP2014/079468, filed on Dec. 30, 2014, which claims the benefit of European Patent Application No. 14150145.2, filed on Jan. 3, 2014. These applications are hereby incorporated by reference herein.

FIELD OF THE INVENTION

The present invention generally relates to the field of bioinformatics, genomic processing, proteomic processing, and related arts. More particularly, the present invention relates to a method comprising inferring activity of a PI3K cellular signaling pathway in a tissue and/or cells and/or a body fluid of a medical subject based at least on expression levels of one or more target gene(s) of the PI3K cellular signaling pathway measured in an extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject. The present invention further relates to an apparatus comprising a digital processor configured to perform such a method, a non-transitory storage medium storing instructions that are executable by a digital processing device to perform such a method, and a computer program comprising program code means for causing a digital processing device to perform such a method.

BACKGROUND OF THE INVENTION

Genomic and proteomic analyses have substantial realized and potential promise for clinical application in medical fields such as oncology, where various cancers are known to be associated with specific combinations of genomic mutations/variations and/or high or low expression levels for specific genes, which play a role in growth and evolution of cancer, e.g., cell proliferation and metastasis.

For example, screening for an over-expression of the HER2 receptor on the membrane of cells in breast cancer samples is currently the standard test performed for identifying patients that are eligible for HER2 inhibitors such as Trastuzumab. Over-expression of the ERBB2 gene, which results in an over-expression of the HER2 receptor on the cell membrane, occurs in approximately 25% to 30% of all breast cancers and is associated with an increased disease recurrence and a poor prognosis. However, the expression of the HER2 receptor is by no means a conclusive indicator for driving tumor growth as the signaling initiated by the HER2 receptor can for instance be dampened by the downstream cellular signaling pathway. This also seems to be reflected in the initial response rate of 26% in HER2-positive breast cancer patients treated with Trastuzumab (Charles L. Vogel, et al., “Efficacy and Safety of Trastuzumab as a Single Agent in First-Line Treatment of HER2-Overexpressing Metastatic Breast Cancer”, Journal of Clinical Oncology, Vol. 20, No. 3, February 2002, pages 719 to 726). Besides that, the cellular signaling pathway downstream of the HER2 receptor can also be activated by mutations/over-expression in proteins downstream of the HER2 receptor, resulting in (a) relatively aggressive tumor type(s) that will not be detected by measuring HER2 expression levels. It is therefore desirable to be able to improve the possibilities of characterizing patients that have a tumor, e.g., breast cancer, which is at least partially driven by effects occurring in the cellular signaling pathway downstream of the HER2 receptor.

SUMMARY OF THE INVENTION

The present invention provides new and improved methods and apparatuses as disclosed herein.

In accordance with a main aspect of the present invention, the above problem is solved by a method for inferring activity of a PI3K cellular signaling pathway using mathematical modelling of target gene expressions, namely a method comprising:

inferring activity of a PI3K cellular signaling pathway in a tissue and/or cells and/or a body fluid of a medical subject based at least on expression levels of one or more target gene(s) of the PI3K cellular signaling pathway measured in an extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject, wherein the inferring comprises:

determining a level of a FOXO transcription factor (TF) element in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject, the FOXO TF element controlling transcription of the one or more target gene(s) of the PI3K cellular signaling pathway, the determining being based at least in part on evaluating a mathematical model relating expression levels of the one or more target gene(s) of the PI3K cellular signaling pathway to the level of the FOXO TF element;

inferring the activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject based on the determined level of the FOXO TF element in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject, wherein the inferring is performed by a digital processing device using the mathematical model.

The present invention is based on the realization of the inventors that a suitable way of identifying effects occurring in the cellular signaling pathway downstream of the HER2 receptor, herein, the PI3K cellular signaling pathway, can be based on a measurement of the signaling output of the cellular signaling pathway, which is, amongst others, the transcription of the target genes by a transcription factor (TF), herein, the FOXO TF element, controlled by the cellular signaling pathway. The PI3K cellular signaling pathway targeted herein is not only linked to breast cancer, but is known to be inappropriately activated in many types of cancer (Jeffrey A. Engelman, “Targeting PI3K signalling in cancer: opportunities, challenges and limitations”, Nature Reviews Cancer, No. 9, August 2009, pages 550 to 562). It is thought to be regulated by the RTK receptor family, which also includes the HER-family. Subsequently, the PI3K cellular signaling pathway passes on its received signal(s) via a multitude of processes, of which the two main branches are the activation of the mTOR complexes and the inactivation of a family of transcription factors often referred to as FOXO (cf. the figure showing the PI3K cellular signaling pathway in the above article from Jeffrey A. Engelman). The present invention concentrates on the PI3K cellular signaling pathway and the FOXO TF family, the activity of which is substantially negatively correlated with the activity of the PI3K cellular signaling pathway, i.e., activity of FOXO is substantially correlated with inactivity of the PI3K cellular signaling pathway, whereas inactivity of FOXO is substantially correlated with activity of the PI3K cellular signaling pathway.

The present invention makes it possible to determine the activity of the PI3K cellular signaling pathway in a tissue and/or cells and/or a body fluid of a medical subject by (i) determining a level of a FOXO TF element in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject, wherein the determining is based at least in part on evaluating a mathematical model relating expression levels of one or more target gene(s) of the PI3K cellular signaling pathway, the transcription of which is controlled by the FOXO TF element, to the level of the FOXO TF element, and by (ii) inferring the activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject based on the determined level of the FOXO TF element in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject. This preferably allows improving the possibilities of characterizing patients that have a tumor, e.g., breast cancer, which is at least partially driven by a deregulated PI3K cellular signaling pathway, and that are therefore likely to respond to inhibitors of the PI3K cellular signaling pathway.

Herein, a FOXO transcription factor (TF) element is defined to be a protein complex containing at least one of the FOXO TF family members, i.e., FOXO1, FOXO3A, FOXO4 and FOXO6, which is capable of binding to specific DNA sequences, thereby controlling transcription of target genes.

The mathematical model may be a probabilistic model, preferably a Bayesian network model, based at least in part on conditional probabilities relating the FOXO TF element and expression levels of the one or more target gene(s) of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject, or the mathematical model may be based at least in part on one or more linear combination(s) of expression levels of the one or more target gene(s) of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject. In particular, the inferring of the activity of the PI3K cellular signaling pathway may be performed as disclosed in the published international patent application WO 2013/011479 A2 (“Assessment of cellular signaling pathway activity using probabilistic modeling of target gene expression”) or as described in the published international patent application WO 2014/102668 A2 (“Assessment of cellular signaling pathway activity using linear combination(s) of target gene expressions”), the contents of which are herewith incorporated in their entirety.

The medical subject may be a human or an animal. Moreover, the tissue and/or the cells and/or the body fluid of the medical subject may be from a cell line and/or a tissue culture derived from a medical subject and, if applicable, cultivated in vitro in the lab (e.g., for regenerative purposes). Furthermore, the “target gene(s)” may be “direct target genes” and/or “indirect target genes” (as described herein).

Particularly suitable target genes are described in the following text passages as well as the examples below (see, e.g., Tables 1 to 3).

Thus, according to a preferred embodiment the target gene(s) is/are selected from the group consisting of the target genes listed in Table 3.

Particularly preferred is a method wherein the inferring comprises:

inferring the activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject based at least on expression levels of one or more, preferably at least three, target gene(s) of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject selected from the group consisting of: AGRP, BCL2L11, BCL6, BNIP3, BTG1, CAT, CAV1, CCND1, CCND2, CCNG2, CDKN1A, CDKN1B, ESR1, FASLG, FBXO32, GADD45A, INSR, MXI1, NOS3, PCK1, POMC, PPARGC1A, PRDX3, RBL2, SOD2 and TNFSF10.

Further preferred is a method, wherein the inferring is further based on expression levels of at least one target gene of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject selected from the group consisting of: ATP8A1, C10orf10, CBLB, DDB1, DYRK2, ERBB3, EREG, EXT1, FGFR2, IGF1R, IGFBP1, IGFBP3, LGMN, PPM1D, SEMA3C, SEPP1, SESN1, SLC5A3, SMAD4 and TLE4.

Further preferred is a method, wherein the inferring is further based on expression levels of at least one target gene of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject selected from the group consisting of: ATG14, BIRC5, IGFBP1, KLF2, KLF4, MYOD1, PDK4, RAG1, RAG2, SESN1, SIRT1, STK11 and TXNIP.

If the inferring is further based both on expression levels of at least one target gene selected from the group specified in the preceding paragraph and on expression levels of at least one target gene selected from the group specified in the paragraph preceding the preceding paragraph, the target genes IGFBP1 and SESN1, which are mentioned above with respect to both groups, may only be contained in one of the groups.

Another aspect of the present invention relates to a method (as described herein), further comprising:

determining whether the PI3K cellular signaling pathway is operating abnormally in the tissue and/or the cells and/or the body fluid of the medical subject based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject.

The present invention also relates to a method (as described herein) further comprising:

recommending prescribing a drug for the medical subject that corrects for abnormal operation of the PI3K cellular signaling pathway, wherein the recommending is performed only if the PI3K cellular signaling pathway is determined to be operating abnormally in the tissue and/or the cells and/or the body fluid of the medical subject based on the inferred activity of the PI3K cellular signaling pathway.

The present invention also relates to a method (as described herein), wherein the inferring comprises:

inferring the activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject based at least on expression levels of two, three or more target genes of a set of target genes of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject.

Preferably,

the set of target genes of the PI3K cellular signaling pathway includes at least nine, preferably all target genes selected from the group consisting of: AGRP, BCL2L11, BCL6, BNIP3, BTG1, CAT, CAV1, CCND1, CCND2, CCNG2, CDKN1A, CDKN1B, ESR1, FASLG, FBXO32, GADD45A, INSR, MXI1, NOS3, PCK1, POMC, PPARGC1A, PRDX3, RBL2, SOD2 and TNFSF10.

A method, wherein

the set of target genes of the PI3K cellular signaling pathway further includes at least one target gene selected from the group consisting of: ATP8A1, C10orf10, CBLB, DDB1, DYRK2, ERBB3, EREG, EXT1, FGFR2, IGF1R, IGFBP1, IGFBP3, LGMN, PPM1D, SEMA3C, SEPP1, SESN1, SLC5A3, SMAD4 and TLE4,

is particularly preferred.

A method, wherein

the set of target genes of the PI3K cellular signaling pathway further includes at least one target gene selected from the group consisting of: ATG14, BIRC5, IGFBP1, KLF2, KLF4, MYOD1, PDK4, RAG1, RAG2, SESN1, SIRT1, STK11 and TXNIP,

is also particularly preferred.

If the set of target genes further includes both at least one target gene selected from the group specified in the preceding paragraph and at least one target gene selected from the group specified in the paragraph preceding the preceding paragraph, the target genes IGFBP1 and SESN1, which are mentioned above with respect to both groups, may only be contained in one of the groups.

The sample(s) to be used in accordance with the present invention can be, e.g., a sample obtained from a cancer lesion, or from a lesion suspected for cancer, or from a metastatic tumor, or from a body cavity in which fluid is present which is contaminated with cancer cells (e.g., pleural or abdominal cavity or bladder cavity), or from other body fluids containing cancer cells, and so forth, preferably via a biopsy procedure or other sample extraction procedure. The cells of which a sample is extracted may also be tumorous cells from hematologic malignancies (such as leukemia or lymphoma). In some cases, the cell sample may also be circulating tumor cells, that is, tumor cells that have entered the bloodstream and may be extracted using suitable isolation techniques, e.g., apheresis or conventional venous blood withdrawal. Aside from blood, the body fluid of which a sample is extracted may be urine, gastrointestinal contents, or an extravasate. The term “extracted sample”, as used herein, also encompasses the case where tissue and/or cells and/or body fluid of the subject have been taken from the subject and, e.g., have been put on a microscope slide, and where for performing the claimed method a portion of this sample is extracted, e.g., by means of Laser Capture Microdissection (LCM), or by scraping off the cells of interest from the slide, or by fluorescence-activated cell sorting techniques.

In accordance with another disclosed aspect, an apparatus comprises a digital processor configured to perform a method according to the present invention as described herein.

In accordance with another disclosed aspect, a non-transitory storage medium stores instructions that are executable by a digital processing device to perform a method according to the present invention as described herein. The non-transitory storage medium may be a computer-readable storage medium, such as a hard drive or other magnetic storage medium, an optical disk or other optical storage medium, a random access memory (RAM), read only memory (ROM), flash memory, or other electronic storage medium, a network server, or so forth. The digital processing device may be a handheld device (e.g., a personal data assistant or smartphone), a notebook computer, a desktop computer, a tablet computer or device, a remote network server, or so forth.

In accordance with another disclosed aspect, a computer program comprises program code means for causing a digital processing device to perform a method according to the present invention as described herein. The digital processing device may be a handheld device (e.g., a personal data assistant or smartphone), a notebook computer, a desktop computer, a tablet computer or device, a remote network server, or so forth.

The present invention as described herein can, e.g., also advantageously be used in connection with:

diagnosis based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject;

prognosis based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject;

drug prescription based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject;

prediction of drug efficacy based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject;

prediction of adverse effects based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject;

monitoring of drug efficacy;

drug development;

assay development;

pathway research;

cancer staging;

enrollment of the medical subject in a clinical trial based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject;

selection of subsequent test to be performed; and

selection of companion diagnostics tests.

Further advantages will be apparent to those of ordinary skill in the art upon reading and understanding the attached figures, the following description and, in particular, upon reading the detailed examples provided herein below.

It shall be understood that the method of claim 1, the apparatus of claim 13, the non-transitory storage medium of claim 15, and the computer program of claim 15 have similar and/or identical preferred embodiments, in particular, as defined in the dependent claims.

It shall be understood that a preferred embodiment of the present invention can also be any combination of the dependent claims or above embodiments with the respective independent claim.

These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows schematically and exemplarily a mathematical model, herein, a Bayesian network model, used to model the transcriptional program of the PI3K cellular signaling pathway.

FIG. 2 shows training results of the exemplary Bayesian network model based on (A.) the evidence curated list of target genes of the PI3K cellular signaling pathway (cf. Table 1), (B.) the database-based list of target genes of the PI3K cellular signaling pathway (cf. Table 2), and (C.) the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3).

FIG. 3 shows test results of the exemplary Bayesian network model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for breast (cancer) samples of GSE17907.

FIG. 4 shows test results of the exemplary Bayesian network model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for a number of healthy colon samples (group 1) and adenomatous polyps (group 2) published as the GSE8671 dataset.

FIG. 5 shows test results of the exemplary Bayesian network model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for colon (cancer) samples of GSE20916.

FIG. 6 shows test results of the exemplary Bayesian network model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for prostate (cancer) cells published in the GSE17951 dataset.

FIG. 7 illustrates a prognosis of ER+ breast cancer patients (GSE6532 & GSE9195) depicted in a Kaplan-Meier plot.

FIG. 8 shows training results of the exemplary linear model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3).

FIG. 9 shows test results of the exemplary linear model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for breast (cancer) samples of GSE17907.

FIG. 10 shows test results of the exemplary linear model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for prostate (cancer) samples of GSE17951.

DETAILED DESCRIPTION OF EMBODIMENTS

The following examples merely illustrate particularly preferred methods and selected aspects in connection therewith. The teaching provided therein may be used for constructing several tests and/or kits, e.g., to detect, predict and/or diagnose the abnormal activity of one or more cellular signaling pathways. Furthermore, upon using methods as described herein drug prescription can advantageously be guided, drug prediction and monitoring of drug efficacy (and/or adverse effects) can be made, drug resistance can be predicted and monitored, e.g., to select subsequent test(s) to be performed (like a companion diagnostic test). The following examples are not to be construed as limiting the scope of the present invention.

Example 1: Mathematical Model Construction

As described in detail in the published international patent application WO 2013/011479 A2 (“Assessment of cellular signaling pathway activity using probabilistic modeling of target gene expression”), by constructing a probabilistic model, e.g., a Bayesian network model, and incorporating conditional probabilistic relationships between expression levels of one or more target gene(s) of a cellular signaling pathway, herein, the PI3K cellular signaling pathway, and the level of a transcription factor (TF) element, herein, the FOXO TF element, the TF element controlling transcription of the one or more target gene(s) of the cellular signaling pathway, such a model may be used to determine the activity of the cellular signaling pathway with a high degree of accuracy. Moreover, the probabilistic model can be readily updated to incorporate additional knowledge obtained by later clinical studies, by adjusting the conditional probabilities and/or adding new nodes to the model to represent additional information sources. In this way, the probabilistic model can be updated as appropriate to embody the most recent medical knowledge.
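The role of the conditional probabilities can be illustrated with a deliberately reduced sketch. It assumes a single binary TF node and binary ("high"/"low") target gene observations that are conditionally independent given the TF state; this is a simplification for illustration only and not the network structure of WO 2013/011479 A2. The function name, the prior and all probability values are hypothetical.

```python
import math

def posterior_tf_active(observations, cpt, prior_active=0.5):
    """Posterior probability that the TF element is active.

    observations: dict gene -> True if the gene is observed as "high".
    cpt: dict gene -> (P(high | TF active), P(high | TF inactive)).
    Assumes the genes are conditionally independent given the TF state.
    """
    log_act = math.log(prior_active)
    log_inact = math.log(1.0 - prior_active)
    for gene, is_high in observations.items():
        p_high_act, p_high_inact = cpt[gene]
        log_act += math.log(p_high_act if is_high else 1.0 - p_high_act)
        log_inact += math.log(p_high_inact if is_high else 1.0 - p_high_inact)
    # Normalise in log space for numerical stability.
    top = max(log_act, log_inact)
    act = math.exp(log_act - top)
    inact = math.exp(log_inact - top)
    return act / (act + inact)

# Hypothetical conditional probabilities for three target genes.
cpt = {"BCL6": (0.8, 0.2), "SOD2": (0.7, 0.3), "CDKN1B": (0.9, 0.4)}
obs = {"BCL6": True, "SOD2": True, "CDKN1B": False}
p_foxo_active = posterior_tf_active(obs, cpt)
```

Updating the model with later knowledge then amounts to changing entries of `cpt` or adding further genes, which mirrors the adjustment of conditional probabilities and addition of nodes described above.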

In another easy to comprehend and interpret approach described in detail in the published international patent application WO 2014/102668 A2 (“Assessment of cellular signaling pathway activity using linear combination(s) of target gene expressions”), the activity of a cellular signaling pathway, herein, the PI3K cellular signaling pathway, may be determined by constructing and evaluating a linear or (pseudo-)linear model incorporating relationships between expression levels of one or more target gene(s) of the cellular signaling pathway and the level of a transcription factor (TF) element, herein, the FOXO TF element, the TF element controlling transcription of the one or more target gene(s) of the cellular signaling pathway, the model being based at least in part on one or more linear combination(s) of expression levels of the one or more target gene(s).

In both approaches, the expression levels of the one or more target gene(s) may preferably be measurements of the level of mRNA, which can be the result of, e.g., (RT)-PCR and microarray techniques using probes associated with the target gene(s) mRNA sequences, and of RNA-sequencing. In another embodiment the expression levels of the one or more target gene(s) can be measured by protein levels, e.g., the concentrations of the proteins encoded by the target genes.

The aforementioned expression levels may optionally be converted in many ways that might or might not suit the application better. For example, four different transformations of the expression levels, e.g., microarray-based mRNA levels, may be:

“continuous data”, i.e., expression levels as obtained after preprocessing of microarrays using well known algorithms such as MAS5.0 and fRMA,

“z-score”, i.e., continuous expression levels scaled such that the average across all samples is 0 and the standard deviation is 1,

“discrete”, i.e., every expression above a certain threshold is set to 1 and below it to 0 (e.g., the threshold for a probeset may be chosen as the median of its value in a set of a number of positive and the same number of negative clinical samples),

“fuzzy”, i.e., the continuous expression levels are converted to values between 0 and 1 using a sigmoid function of the following format: 1/(1+exp((thr−expr)/se)), with expr being the continuous expression levels, thr being the threshold as mentioned before and se being a softening parameter influencing the difference between 0 and 1.
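The “z-score”, “discrete” and “fuzzy” transformations listed above can be sketched as follows. This is a minimal illustration with made-up expression values; the function names are hypothetical, the use of the population standard deviation for the z-score is an assumption (the text does not say which estimator is meant), and a value exactly equal to the threshold is mapped to 0 here, which the text leaves open.

```python
import math
import statistics

def z_scores(values):
    """Scale values so that their mean is 0 and their standard deviation is 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]

def discretize(values, thr):
    """Set every value above thr to 1 and every other value to 0."""
    return [1 if v > thr else 0 for v in values]

def fuzzy(values, thr, se):
    """Sigmoid 1/(1+exp((thr-expr)/se)), mapping values into (0, 1)."""
    return [1.0 / (1.0 + math.exp((thr - v) / se)) for v in values]

# Made-up continuous expression levels of one probeset.
positive_samples = [9.0, 10.0]
negative_samples = [6.0, 7.0]
thr = statistics.median(positive_samples + negative_samples)  # 8.0

expr = [6.0, 7.0, 8.0, 9.0, 10.0]
expr_z = z_scores(expr)
expr_discrete = discretize(expr, thr)
expr_fuzzy = fuzzy(expr, thr, se=1.0)
```

A smaller `se` makes the sigmoid steeper, so the “fuzzy” values approach the “discrete” ones; a larger `se` keeps more of the continuous information.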

One of the simplest linear models that can be constructed is a model having a node representing the transcription factor (TF) element, herein, the FOXO TF element, in a first layer and weighted nodes representing direct measurements of the target gene(s) expression intensity levels, e.g., by one probeset that is particularly highly correlated with the particular target gene, e.g., in microarray or (q)PCR experiments, in a second layer. The weights can be based either on calculations from a training data set or based on expert knowledge. This approach of using, in the case where possibly multiple expression levels are measured per target gene (e.g., in the case of microarray experiments, where one target gene can be measured with multiple probesets), only one expression level per target gene is particularly simple. A specific way of selecting the one expression level that is used for a particular target gene is to use the expression level from the probeset that is able to separate active and passive samples of a training data set the best. One method to determine this probeset is to perform a statistical test, e.g., the t-test, and select the probeset with the lowest p-value. By definition, the probeset with the lowest p-value is the one for which the expression levels of the (known) active and passive training samples are least likely to overlap. Another selection method is based on odds-ratios. In such a model, one or more expression level(s) are provided for each of the one or more target gene(s) and the one or more linear combination(s) comprise a linear combination including for each of the one or more target gene(s) a weighted term, each weighted term being based on only one expression level of the one or more expression level(s) provided for the respective target gene. If the only one expression level is chosen per target gene as described above, the model may be called a “most discriminant probesets” model.
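The probeset selection by t-test can be sketched as follows. The sketch assumes a two-sample Student t-test with pooled variance and that all probesets of a gene are measured on the same training samples, so that the degrees of freedom are identical and the lowest two-sided p-value corresponds to the largest absolute t statistic; the text does not specify the test variant. Probeset names and values are made up.

```python
import math
import statistics

def t_statistic(active, passive):
    """Two-sample Student t statistic with pooled variance."""
    n_a, n_p = len(active), len(passive)
    pooled = ((n_a - 1) * statistics.variance(active)
              + (n_p - 1) * statistics.variance(passive)) / (n_a + n_p - 2)
    diff = statistics.fmean(active) - statistics.fmean(passive)
    return diff / math.sqrt(pooled * (1.0 / n_a + 1.0 / n_p))

def most_discriminant_probeset(probesets):
    """probesets: dict name -> (active_values, passive_values).

    Returns the name with the largest |t|, i.e. the lowest p-value when all
    probesets share the same sample sizes.
    """
    return max(probesets, key=lambda name: abs(t_statistic(*probesets[name])))

# Made-up training data for two probesets of the same target gene.
probesets = {
    "probeset_A": ([8.1, 8.4, 7.9, 8.3], [7.8, 8.2, 8.0, 8.1]),
    "probeset_B": ([9.5, 9.8, 9.6, 9.9], [6.1, 6.4, 6.0, 6.3]),
}
best = most_discriminant_probeset(probesets)
```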

In an alternative to the “most discriminant probesets” model, it is possible, in the case where possibly multiple expression levels are measured per target gene, to make use of all the expression levels that are provided per target gene. In such a model, one or more expression level(s) are provided for each of the one or more target gene(s) and the one or more linear combination(s) comprise a linear combination of all expression levels of the one or more expression level(s) provided for the one or more target gene(s). In other words, for each of the one or more target gene(s), each of the one or more expression level(s) provided for the respective target gene may be weighted in the linear combination by its own (individual) weight. This variant may be called an “all probesets” model. It has an advantage of being relatively simple while making use of all the provided expression levels.

Both models as described above have in common that they are what may be regarded as “single-layer” models, in which the level of the TF element is calculated based on a linear combination of expression levels.
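The two “single-layer” variants differ only in which expression levels enter the linear combination, as the following sketch illustrates. Function names, weights and expression values are hypothetical.

```python
def tf_level_most_discriminant(expr, chosen, weights):
    """One weighted term per gene, using only the chosen probeset of each gene.

    expr: dict gene -> list of probeset expression levels.
    chosen: dict gene -> index of the selected probeset.
    weights: dict gene -> weight of that gene's term.
    """
    return sum(weights[g] * expr[g][chosen[g]] for g in expr)

def tf_level_all_probesets(expr, weights):
    """Every probeset of every gene contributes with its own weight.

    weights: dict gene -> list of weights, one per probeset.
    """
    return sum(w * e for g in expr for w, e in zip(weights[g], expr[g]))

# Made-up expression levels: two probesets for BCL6, one for SOD2.
expr = {"BCL6": [0.5, -0.2], "SOD2": [1.0]}

level_md = tf_level_most_discriminant(
    expr, chosen={"BCL6": 0, "SOD2": 0}, weights={"BCL6": 1.0, "SOD2": 2.0})
level_all = tf_level_all_probesets(
    expr, weights={"BCL6": [1.0, 0.5], "SOD2": [2.0]})
```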

After the level of the TF element, herein, the FOXO TF element, has been determined by evaluating the respective model, the determined TF element level can be thresholded in order to infer the activity of the cellular signaling pathway, herein, the PI3K cellular signaling pathway. A method to calculate such an appropriate threshold is by comparing the determined TF element level wlc of training samples known to have a passive pathway and training samples with an active pathway. A method that does so and also takes into account the variance in these groups is given by using a threshold

$\begin{matrix}{{thr} = \frac{{\sigma_{wlc_{pas}}\mu_{wlc_{act}}} + {\sigma_{wlc_{act}}\mu_{wlc_{pas}}}}{\sigma_{wlc_{pas}} + \sigma_{wlc_{act}}}} & (1)\end{matrix}$

where σ and μ are the standard deviation and the mean of the training samples. In case only a small number of samples are available in the active and/or passive training samples, a pseudocount may be added to the calculated variances based on the average of the variances of the two groups:

$\begin{matrix}{\begin{array}{l}{\overset{\sim}{v} = \frac{v_{wlc_{act}} + v_{wlc_{pas}}}{2}} \\ {{\overset{\sim}{v}}_{wlc_{act}} = \frac{{x\overset{\sim}{v}} + {\left( {n_{act} - 1} \right)v_{wlc_{act}}}}{x + n_{act} - 1}} \\ {{\overset{\sim}{v}}_{wlc_{pas}} = \frac{{x\overset{\sim}{v}} + {\left( {n_{pas} - 1} \right)v_{wlc_{pas}}}}{x + n_{pas} - 1}}\end{array}} & (2)\end{matrix}$

where v is the variance of the groups and x a positive pseudocount. The standard deviation σ can next be obtained by taking the square root of the variance v.

The threshold can be subtracted from the determined level of the TF element wlc for ease of interpretation, resulting in the cellular signaling pathway's activity score, such that negative values correspond to a passive cellular signaling pathway and positive values to an active cellular signaling pathway.
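Equations (1) and (2) and the resulting activity score can be sketched as follows. The sketch assumes that v is the sample variance (n − 1 in the denominator), that the pseudocount-adjusted variances are the ones used to obtain σ for equation (1), and that n_act and n_pas are the numbers of active and passive training samples; the function names and training values are made up.

```python
import math
import statistics

def adjusted_variances(wlc_act, wlc_pas, x=1.0):
    """Equation (2): shrink each group variance towards the average variance."""
    v_act = statistics.variance(wlc_act)  # assumption: sample variance
    v_pas = statistics.variance(wlc_pas)
    v_tilde = (v_act + v_pas) / 2.0
    n_act, n_pas = len(wlc_act), len(wlc_pas)
    v_act_adj = (x * v_tilde + (n_act - 1) * v_act) / (x + n_act - 1)
    v_pas_adj = (x * v_tilde + (n_pas - 1) * v_pas) / (x + n_pas - 1)
    return v_act_adj, v_pas_adj

def threshold(wlc_act, wlc_pas, x=1.0):
    """Equation (1), evaluated with the adjusted standard deviations."""
    v_act, v_pas = adjusted_variances(wlc_act, wlc_pas, x)
    s_act, s_pas = math.sqrt(v_act), math.sqrt(v_pas)
    mu_act = statistics.fmean(wlc_act)
    mu_pas = statistics.fmean(wlc_pas)
    return (s_pas * mu_act + s_act * mu_pas) / (s_pas + s_act)

def activity_score(wlc, thr):
    """Positive scores indicate an active pathway, negative a passive one."""
    return wlc - thr

# Made-up TF element levels of training samples.
wlc_active = [2.0, 3.0, 4.0]
wlc_passive = [-5.0, -3.0, -1.0]
thr = threshold(wlc_active, wlc_passive)
score = activity_score(3.5, thr)
```

With equal variances in both groups the threshold lies midway between the two means; otherwise it moves towards the mean of the group with the smaller spread, here the active group.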

As an alternative to the above-described “single-layer” models, a“two-layer” model may also be used in an example. In such a model, asummary value is calculated for every target gene using a linearcombination based on the measured intensities of its associatedprobesets (“first (bottom) layer”). The calculated summary value issubsequently combined with the summary values of the other target genesof the cellular signaling pathway using a further linear combination(“second (upper) layer”). Again, the weights can be either learned froma training data set or based on expert knowledge or a combinationthereof. Phrased differently, in the “two-layer” model, one or moreexpression level(s) are provided for each of the one or more targetgene(s) and the one or more linear combination(s) comprise for each ofthe one or more target gene(s) a first linear combination of allexpression levels of the one or more expression level(s) provided forthe respective target gene (“first (bottom) layer”). The model isfurther based at least in part on a further linear combination includingfor each of the one or more target gene(s) a weighted term, eachweighted term being based on the first linear combination for therespective target gene (“second (upper) layer”

The calculation of the summary values can, in a preferred version of the“two-layer” model, include defining a threshold for each target geneusing the training data and subtracting the threshold from thecalculated linear combination, yielding the target gene summary. Herethe threshold may be chosen such that a negative target gene summaryvalue corresponds to a down-regulated target gene and that a positivetarget gene summary value corresponds to an up-regulated target gene.Also, it is possible that the target gene summary values are transformedusing, e.g., one of the above-described transformations (fuzzy,discrete, etc.), before they are combined in the “second (upper) layer”.Next the determined target genes summary values are summed to get the TFsummary level.

After the level of the TF element has been determined by evaluating the“two-layer” model, the determined TF element level can be thresholded inorder to infer the activity of the cellular signaling pathway, asdescribed above.
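The “two-layer” evaluation described above can be sketched as follows. The gene names, weights, thresholds and intensities are hypothetical placeholders chosen for illustration, and the dictionary layout of `model` is an assumption rather than a prescribed data structure.

```python
def gene_summary(intensities, weights, threshold=0.0):
    """First (bottom) layer: linear combination of the probeset intensities
    of one target gene, minus a per-gene threshold."""
    if len(intensities) != len(weights):
        raise ValueError("one weight is needed per probeset")
    return sum(w * x for w, x in zip(weights, intensities)) - threshold

def tf_level(sample, model):
    """Second (upper) layer: weighted combination of the gene summaries."""
    total = 0.0
    for gene, params in model.items():
        summary = gene_summary(sample[gene],
                               params["probeset_weights"],
                               params["threshold"])
        total += params["gene_weight"] * summary
    return total

def activity_score(sample, model, tf_threshold):
    """Subtract the TF-level threshold so that the sign carries the call."""
    return tf_level(sample, model) - tf_threshold

# Hypothetical two-gene model; GENE_B is treated as a down-regulated target.
model = {
    "GENE_A": {"probeset_weights": [0.5, 0.5], "threshold": 7.0, "gene_weight": 1.0},
    "GENE_B": {"probeset_weights": [1.0], "threshold": 6.0, "gene_weight": -1.0},
}
sample = {"GENE_A": [8.0, 9.0], "GENE_B": [5.0]}
level = tf_level(sample, model)
score = activity_score(sample, model, tf_threshold=0.5)
```

A transformation of the gene summaries (fuzzy, discrete, etc.) could be applied inside `tf_level` before the weighted sum; it is left out here to keep the sketch short.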

In the following, the models described above are collectively denoted as“(pseudo-) linear” models. A more detailed description of the trainingand use of probabilistic models, e.g., a Bayesian network model, and of(pseudo-)linear models is provided in Example 3 below.

Example 2: Selection of Target Genes

A transcription factor (TF) is a protein complex (i.e., a combination of proteins bound together in a specific structure) or a protein that is able to regulate transcription from target genes by binding to specific DNA sequences, thereby controlling the transcription of genetic information from DNA to mRNA. The mRNA directly produced due to this action of the transcription complex is herein referred to as a “direct target gene” (of the transcription factor). Cellular signaling pathway activation may also result in more secondary gene transcription, referred to as “indirect target genes”. In the following, Bayesian network models (as exemplary mathematical models) comprising or consisting of direct target genes as direct links between cellular signaling pathway activity and mRNA level are preferred; however, the distinction between direct and indirect target genes is not always evident. Herein, a method to select direct target genes using a scoring function based on available scientific literature data is presented. Nonetheless, an accidental selection of indirect target genes cannot be ruled out due to limited information as well as biological variations and uncertainties. In order to select the target genes, two repositories of currently available scientific literature were employed to generate two lists of target genes.

The first list of target genes was generated based on scientific literature retrieved from the MEDLINE database of the National Institutes of Health, accessible at “www.ncbi.nlm.nih.gov/pubmed” and herein further referred to as “Pubmed”. Publications containing putative FOXO target genes were searched for by using queries such as (FOXO AND “target gene”) in the period of the first quarter of 2013. The resulting publications were further analyzed manually following the methodology described in more detail below. Specific cellular signaling pathway mRNA target genes were selected from the scientific literature by using a ranking system in which scientific evidence for a specific target gene was given a rating, depending on the type of scientific experiments in which the evidence was accumulated. While some experimental evidence is merely suggestive of a gene being a target gene, like for example an mRNA increasing on a microarray of a cell line in which it is known that the PI3K cellular signaling axis is active, other evidence can be very strong, like the combination of an identified cellular signaling pathway TF binding site and retrieval of this site in a chromatin immunoprecipitation (ChIP) assay after stimulation of the specific cellular signaling pathway in the cell and increase in mRNA after specific stimulation of the cellular signaling pathway in a cell line.

Several types of experiments to find specific cellular signaling pathway target genes can be identified in the scientific literature:

1. ChIP experiments in which direct binding of a cellular signaling pathway-TF to its binding site on the genome is shown. Example: By using chromatin immunoprecipitation (ChIP) technology, putative functional FOXO TF binding sites in the DNA of cell lines with and without active induction of the PI3K cellular signaling pathway were subsequently identified, as a subset of the binding sites recognized purely based on nucleotide sequence. Putative functionality was identified as ChIP-derived evidence that the TF was found to bind to the DNA binding site.
2. Electrophoretic Mobility Shift (EMSA) assays which show in vitro binding of a TF to a fragment of DNA containing the binding sequence. Compared to ChIP-based evidence, EMSA-based evidence is less strong, since it cannot be translated to the in vivo situation.
3. Stimulation of the cellular signaling pathway and measuring mRNA profiles on a microarray or using RNA sequencing, using cellular signaling pathway-inducible cell lines and measuring mRNA profiles at several time points after induction, in the presence of cycloheximide, which inhibits translation to protein, so that the induced mRNAs are assumed to be direct target genes.
4. Similar to 3, but using quantitative PCR to measure the amounts of mRNAs.
5. Identification of TF binding sites in the genome using a bioinformatics approach. Example for the FOXO TF element: Using the conserved FOXO binding motif 5′-TTGTTTAC-3′, a software program was run on the human genome sequence, and potential binding sites were identified, both in gene promoter regions and in other genomic regions.
6. Similar to 3, only in the absence of cycloheximide.
7. Similar to 4, only in the absence of cycloheximide.
8. mRNA expression profiling of specific tissue or cell samples of which it is known that the cellular signaling pathway is active, however in absence of the proper negative control condition.

In the simplest form, one can give every potential target mRNA 1 point for each of these experimental approaches in which the target mRNA was identified.

Alternatively, points can be given incrementally, meaning one technology gives 1 point, a second technology adds a second point, and so on. Using this relative ranking strategy, one can make a list of the most reliable target genes.

Alternatively, ranking in another way can be used to identify the target genes that are most likely to be direct target genes, by giving a higher number of points to the technology that provides most evidence for an in vivo direct target gene. In the list above, this would mean 8 points for experimental approach 1), 7 for 2), and going down to 1 point for experimental approach 8). Such a list may be called a “general target gene list”.

Despite the biological variations and uncertainties, the inventors assumed that the direct target genes are the most likely to be induced in a tissue-independent manner. A list of these target genes may be called an “evidence curated list of target genes”. Such an evidence curated list of target genes has been used to construct computational models of the PI3K cellular signaling pathway that can be applied to samples coming from different tissue sources.

The following illustrates, by way of example, how an evidence curated target gene list was constructed specifically for the PI3K cellular signaling pathway.

For the purpose of selecting PI3K target genes used as input for the “model”, the following three criteria were used:
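The point schemes described above can be illustrated with a short Python sketch. The approach numbers 1 to 8 refer to the list of experimental approaches given earlier; the gene names and their evidence are hypothetical placeholders, and the function names are assumptions introduced for illustration.

```python
N_APPROACHES = 8  # experimental approaches 1..8, strongest evidence first

def _validated(approaches):
    """Return the distinct approach numbers, checking that each is in 1..8."""
    distinct = set(approaches)
    if not distinct <= set(range(1, N_APPROACHES + 1)):
        raise ValueError("approach numbers must be between 1 and 8")
    return distinct

def count_score(approaches):
    """One point for each distinct approach that identified the target mRNA."""
    return len(_validated(approaches))

def weighted_score(approaches):
    """8 points for approach 1, 7 for approach 2, down to 1 point for approach 8."""
    return sum(N_APPROACHES + 1 - a for a in _validated(approaches))

def rank_genes(evidence, score=weighted_score):
    """Sort genes by descending score; ties are broken alphabetically."""
    return sorted(evidence, key=lambda gene: (-score(evidence[gene]), gene))

# Hypothetical evidence: gene -> approaches in which it was identified.
evidence = {"GENE_A": [1, 3, 5], "GENE_B": [8], "GENE_C": [2, 4]}
ranking = rank_genes(evidence)
```

Passing `score=count_score` to `rank_genes` gives the simpler one-point-per-approach ranking instead of the weighted one.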

1. Gene promoter/enhancer region contains a FOXO binding motif:
   a. The FOXO binding motif should be proven to respond to an activity of the PI3K cellular signaling pathway, e.g., by means of a transient transfection assay in which the specific FOXO motif is linked to a reporter gene, and
   b. The presence of the FOXO motif should be confirmed by, e.g., an enriched motif analysis of the gene promoter/enhancer region.
2. FOXO (differentially) binds in vivo to the promoter/enhancer region of the gene in question, demonstrated by, e.g., a ChIP/CHIP experiment or another chromatin immunoprecipitation technique:
   a. FOXO is proven to bind to the promoter/enhancer region of the gene when the PI3K cellular signaling pathway is not active, and
   b. (preferably) does not bind (or weakly binds) to the gene promoter/enhancer region of the gene when the PI3K cellular signaling pathway is active.
3. The gene is differentially transcribed when the activity of the PI3K cellular signaling pathway is changed, demonstrated by, e.g.,
   a. fold enrichment of the mRNA of the gene in question through real time PCR, or microarray experiment, or
   b. the demonstration that RNA Pol II binds to the promoter region of the gene through an immunoprecipitation assay.

The selection was performed by defining as target genes of the PI3K cellular signaling pathway the genes for which enough and well documented experimental evidence was gathered proving that all three criteria mentioned above were met. A suitable experiment for collecting evidence of PI3K differential binding is to compare the results of, e.g., a ChIP/CHIP experiment in a cancer cell line that expresses activity of the PI3K cellular signaling pathway in response to tamoxifen (e.g., a cell line transfected with a tamoxifen-inducible FOXO construct, such as FOXO.A3.ER), when exposed or not exposed to tamoxifen. The same holds for collecting evidence of mRNA transcription.

The foregoing discusses the generic approach and a more specific example of the target gene selection procedure that has been employed to select a number of target genes based upon the evidence found using the above mentioned approach. The list of target genes used in the Bayesian network models for the PI3K cellular signaling pathway is shown in Table 1.

TABLE 1 Evidence curated list of target genes of the PI3K cellular signaling pathway used in the Bayesian network models and associated probesets used to measure the mRNA expression level of the target genes.

Target gene   Probeset
ATP8A1        1569773_at, 210192_at, 213106_at
BCL2L11       1553088_a_at, 1553096_s_at, 1555372_at, 1558143_a_at, 208536_s_at, 222343_at, 225606_at
BNIP3         201848_s_at, 201849_at
BTG1          1559975_at, 200920_s_at, 20092_s_at
C10orf10      209182_s_at, 209183_s_at
CAT           201432_at, 211922_s_at, 215573_at
CBLB          208348_s_at, 209682_at
CCND1         208711_s_at, 208712_at, 214019_at
CCND2         200951_s_at, 200952_s_at, 200953_s_at, 231259_s_at, 1555056_at, 202769_at, 202770_s_at, 211559_s_at
CDKN1B        209112_at
DDB1          208619_at
DYRK2         202968_s_at, 202969_at, 202970_at, 202971_s_at
ERBB3         1563252_at, 1563253_s_at, 202454_s_at, 215638_at, 226213_at
EREG          1569583_at, 205767_at
ESR1          205225_at, 211233_x_at, 211234_x_at, 211235_s_at, 211627_x_at, 215551_at, 215552_s_at, 217190_x_at, 207672_at
EXT1          201995_at
FASLG         210865_at, 211333_s_at
FGFR2         203638_s_at, 203639_s_at, 208225_at, 208228_s_at, 208229_at, 208234_x_at, 211398_at, 211399_at, 211400_at, 211401_s_at, 240913_at
GADD45A       203725_at
IGF1R         203627_at, 203628_at, 208441_at, 225330_at, 243358_at
IGFBP1        205302_at
IGFBP3        210095_s_at, 212143_s_at
INSR          207851_s_at, 213792_s_at, 226212_s_at, 226216_at, 226450_at
LGMN          201212_at
MXI1          202364_at
PPM1D         204566_at, 230330_at
SEMA3C        203788_s_at, 203789_s_at
SEPP1         201427_s_at, 231669_at
SESN1         218346_s_at
SLC5A3        1553313_s_at, 212944_at, 213167_s_at, 213164_at
SMAD4         1565702_at, 1565703_at, 202526_at, 202527_s_at, 235725_at
SOD2          215078_at, 215223_s_at, 216841_s_at, 221477_s_at
TLE4          204872_at, 214688_at, 216997_x_at, 233575_s_at, 235765_at
TNFSF10       202687_s_at, 202688_at, 214329_x_at

The second list of target genes was generated using the manually-curated database of scientific publications provided within Thomson-Reuters' Metacore (last accessed: May 14, 2013). The database was queried for genes that are transcriptionally regulated directly downstream of the family of human FOXO transcription factors (i.e., FOXO1, FOXO3A, FOXO4 and FOXO6). This query resulted in 336 putative FOXO target genes that were further analyzed as follows. First, all putative FOXO target genes that only had one supporting publication were pruned. Next, a scoring function was introduced that gave a point for each type of experimental evidence, such as ChIP, EMSA, differential expression, knock down/out, luciferase gene reporter assay, or sequence analysis, that was reported in a publication. The same experimental evidence is sometimes mentioned in multiple publications, resulting in a corresponding number of points, e.g., two publications mentioning a ChIP finding result in twice the score that is given for a single ChIP finding. Further analysis was performed to allow only for genes that had diverse types of experimental evidence and not only one type of experimental evidence, e.g., differential expression. Finally, an evidence score was calculated for all putative FOXO target genes, and all putative FOXO target genes with an evidence score of 6 or more were selected (shown in Table 2). The cut-off level of 6 was chosen heuristically, as it was previously shown that approximately 30 target genes largely suffice to determine pathway activity.

A list of these target genes may be called a “database-based list of target genes”. Such a curated target gene list has been used to construct computational models that can be applied to samples coming from different tissue sources.
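The pruning and scoring steps described above can be sketched in Python as follows. The input layout (a list of publication/evidence-type pairs per gene), the function name `select_targets`, and all gene and publication identifiers are hypothetical; reading “diverse types” as “at least two distinct evidence types” is likewise an assumption made for this sketch.

```python
def select_targets(findings_by_gene, cutoff=6, min_types=2):
    """Return {gene: evidence score} for genes passing all selection steps.

    findings_by_gene maps a gene to (publication_id, evidence_type) pairs.
    Each evidence type reported in a publication is worth one point, so the
    same type reported in two publications counts twice.
    """
    selected = {}
    for gene, findings in findings_by_gene.items():
        unique = set(findings)
        publications = {pub for pub, _ in unique}
        evidence_types = {ev for _, ev in unique}
        if len(publications) < 2:            # only one supporting publication
            continue
        if len(evidence_types) < min_types:  # evidence is not diverse
            continue
        score = len(unique)
        if score >= cutoff:
            selected[gene] = score
    return selected

# Hypothetical findings for four placeholder genes.
findings = {
    "GENE_A": [("P1", "ChIP"), ("P1", "EMSA"), ("P2", "ChIP"),
               ("P2", "differential expression"), ("P3", "luciferase"),
               ("P3", "differential expression")],
    "GENE_B": [("P4", t) for t in ("ChIP", "EMSA", "luciferase",
                                   "knock down/out", "sequence analysis",
                                   "differential expression")],
    "GENE_C": [(f"P{i}", "differential expression") for i in range(5, 11)],
    "GENE_D": [("P11", "ChIP"), ("P12", "EMSA"), ("P12", "ChIP")],
}
shortlisted = select_targets(findings)
```

In this toy input, GENE_B is dropped for having a single publication, GENE_C for having only one evidence type, and GENE_D for scoring below the cut-off.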

TABLE 2 Database-based list of target genes of the PI3K cellular signaling pathway used in the Bayesian network models and associated probesets used to measure the mRNA expression level of the target genes.

Target gene   Probeset
AGRP          207193_at
ATG14         204568_at
BCL2L11       1553088_a_at, 1553096_s_at, 1555372_at, 1558143_a_at, 208536_s_at, 222343_at, 225606_at
BCL6          203140_at, 215990_s_at
BIRC5         202094_at, 202095_s_at, 210334_x_at
BNIP3         201848_s_at, 201849_at
CAT           201432_at, 211922_s_at, 215573_at
CAV1          203065_s_at, 212097_at
CCNG2         1555056_at, 202769_at, 202770_s_at, 211559_s_at, 228081_at
CDKN1A        1555186_at, 202284_s_at
CDKN1B        209112_at
FASLG         210865_at, 211333_s_at
FBXO32        225801_at, 225803_at, 225345_s_at, 225328_at
GADD45A       203725_at
IGFBP1        205302_at
KLF2          219371_s_at, 226646_at
KLF4          220266_s_at, 221841_s_at
MYOD1         206656_s_at, 206657_s_at
NOS3          205581_s_at
PCK1          208383_s_at
PDK4          1562321_at, 205960_at, 225207_at
POMC          205720_at
PPARGC1A      1569141_a_at, 219195_at
PRDX3         201619_at, 209766_at
RAG1          1554994_at, 206591_at
RAG2          215117_at
RBL2          212331_at, 212332_at
SESN1         218346_s_at
SIRT1         218878_s_at
SOD2          215078_at, 215223_s_at, 216841_s_at, 221477_s_at
STK11         204292_x_at, 231017_at, 41657_at
TNFSF10       202687_s_at, 202688_at, 214329_x_at
TXNIP         201008_s_at, 201009_s_at, 201010_s_at

The third list of target genes was generated on the basis of the two aforementioned lists, i.e., the evidence curated list (cf. Table 1) and the database-based list (cf. Table 2). Three criteria have been used to further select genes from these two lists. The first criterion is related to the function attributed to the target genes. Functions attributed to genes can be found in scientific literature, but are often available in public databases such as the OMIM database of the NIH (accessible via “http://www.ncbi.nlm.nih.gov/omim”). Target genes from the evidence curated list in Table 1 and the database-based list in Table 2 that were found to be involved in processes essential to cancer, such as apoptosis, cell cycle, tumor suppression/progression, DNA repair, and differentiation, were selected in the third list. Secondly, target genes that were found to have a high differential expression in cell line experiments with known high PI3K/low FOXO activity versus known low PI3K/high FOXO activity were selected. Herein, target genes that had a minimum expression difference of 2^(0.5) (herein: on a probeset level) between the “on” and “off” state of FOXO transcription averaged over multiple samples were included in the third list. The third criterion was especially aimed at selecting the most discriminative target genes. Based on the expression levels in cell line experiments with multiple samples with known high PI3K/low FOXO activity and multiple samples with known low PI3K/high FOXO activity, an odds ratio (OR) was calculated. Herein, the odds ratio was calculated per probeset using the median value as a cut-off and a soft boundary representing uncertainty in the measurement. Target genes from the evidence curated list and the database-based list were ranked according to the “soft” odds ratio, and the highest ranked (OR>2) and lowest ranked (OR<½, i.e., negatively regulated target genes) target genes were selected for the third list of target genes.

Taking into account the function of the gene, the differential expression in “on” versus “off” signaling and a higher odds ratio, a set of target genes was found (shown in Table 3) that was considered to be more probative in determining the activity of the PI3K signaling pathway. Such a list of target genes may be called a “shortlist of target genes”. Hence, the target genes reported in Table 3 are particularly preferred according to the present invention. Nonetheless, given the relative ease with which acquisition technology such as microarrays can acquire expression levels for large sets of genes, it is contemplated to utilize some or all of the target genes of Table 3, and optionally additionally use one, two, some, or all of the remaining target genes of Table 1 and Table 2.
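The text does not give the formula behind the “soft” odds ratio, so the following Python sketch is only one plausible reading. The logistic soft boundary around the pooled median, the `softness` width, the `pseudocount` and the example intensities are all assumptions introduced here for illustration.

```python
import math
from statistics import median

def soft_odds_ratio(on_values, off_values, softness=0.25, pseudocount=0.5):
    """Odds ratio of a probeset being 'high' in the FOXO 'on' group versus
    the FOXO 'off' group.

    The cut-off is the median of all samples. Instead of a hard high/low
    call, each sample contributes a logistic membership between 0 and 1
    (an assumed form of the soft boundary). The pseudocount avoids
    division by zero for perfectly separated groups.
    """
    cutoff = median(list(on_values) + list(off_values))

    def p_high(value):
        return 1.0 / (1.0 + math.exp(-(value - cutoff) / softness))

    on_high = sum(p_high(v) for v in on_values)
    off_high = sum(p_high(v) for v in off_values)
    on_low = len(on_values) - on_high
    off_low = len(off_values) - off_high
    return ((on_high + pseudocount) * (off_low + pseudocount)) / (
        (on_low + pseudocount) * (off_high + pseudocount))

# Hypothetical log2 intensities of one probeset in three 'on' and three 'off' samples.
foxo_on = [9.0, 9.2, 8.8]
foxo_off = [7.0, 7.1, 6.9]
or_up = soft_odds_ratio(foxo_on, foxo_off)    # positively regulated: OR > 2
or_down = soft_odds_ratio(foxo_off, foxo_on)  # groups swapped: OR < 1/2
```

Swapping the two groups yields the reciprocal odds ratio, which matches the symmetric OR>2 and OR<½ selection bounds described above.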

TABLE 3 Shortlist of target genes of the PI3K cellular signaling pathway based on the evidence curated list of target genes and the database-based list of target genes.

Target gene
AGRP
BCL2L11
BCL6
BNIP3
BTG1
CAT
CAV1
CCND1
CCND2
CCNG2
CDKN1A
CDKN1B
ESR1
FASLG
FBXO32
GADD45A
INSR
MXI1
NOS3
PCK1
POMC
PPARGC1A
PRDX3
RBL2
SOD2
TNFSF10

Example 3: Training and Using the Mathematical Model

Before the mathematical model can be used to infer the activity of the cellular signaling pathway, herein, the PI3K cellular signaling pathway, in a tissue and/or cells and/or a body fluid of a medical subject, the model must be appropriately trained.

If the mathematical model is a probabilistic model, e.g., a Bayesian network model, based at least in part on conditional probabilities relating the FOXO TF element and expression levels of the one or more target gene(s) of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject, the training may preferably be performed as described in detail in the published international patent application WO 2013/011479 A2 (“Assessment of cellular signaling pathway activity using probabilistic modeling of target gene expression”).

If the mathematical model is based at least in part on one or more linear combination(s) of expression levels of the one or more target gene(s) of the PI3K cellular signaling pathway measured in the extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject, the training may preferably be performed as described in detail in the published international patent application WO 2014/102668 A2 (“Assessment of cellular signaling pathway activity using linear combination(s) of target gene expressions”).

a) Exemplary Bayesian Network Model

Herein, an exemplary Bayesian network model as shown in FIG. 1 was first used to model the transcriptional program of the PI3K cellular signaling pathway in a simple manner. The model consists of three types of nodes: (a) a transcription factor (TF) element in a first layer 1; (b) target gene(s) TG1, TG2, TGn in a second layer 2; and (c) measurement nodes linked to the expression levels of the target gene(s) in a third layer 3. These can be microarray probesets PS1a, PS1b, PS1c, PS2a, PSna, PSnb, as preferably used herein, but could also be other gene expression measurements such as RNAseq or RT-qPCR.

A suitable implementation of the mathematical model, herein, the exemplary Bayesian network model, is based on microarray data. The model describes (i) how the expression levels of the target gene(s) depend on the activation of the TF element, and (ii) how probeset intensities, in turn, depend on the expression levels of the respective target gene(s). For the latter, probeset intensities may be taken from fRMA pre-processed Affymetrix HG-U133Plus2.0 microarrays, which are widely available from the Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo) and ArrayExpress (www.ebi.ac.uk/arrayexpress).

As the exemplary Bayesian network model is a simplification of the biology of a cellular signaling pathway, herein, the PI3K cellular signaling pathway, and as biological measurements are typically noisy, a probabilistic approach was opted for, i.e., the relationships between (i) the TF element and the target gene(s), and (ii) the target gene(s) and their respective probesets, are described in probabilistic terms. Furthermore, it was assumed that the activity of the oncogenic cellular signaling pathway which drives tumor growth is not transiently and dynamically altered, but long term or even irreversibly altered. Therefore, the exemplary Bayesian network model was developed for interpretation of a static cellular condition. For this reason, complex dynamic cellular signaling pathway features were not incorporated into the model.

Once the exemplary Bayesian network model is built and calibrated (see below), the model can be used on microarray data of a new sample by entering the probeset measurements as observations in the third layer 3, and inferring backwards in the model what the probability must have been for the TF element to be “present”. Here, “present” is considered to be the phenomenon that the TF element is bound to the DNA and is controlling transcription of the cellular signaling pathway's target genes, and “absent” the case that the TF element is not controlling transcription. This probability is hence the primary read-out that may be used to indicate activity of the cellular signaling pathway, herein, the PI3K cellular signaling pathway, which can next be translated into the odds of the cellular signaling pathway being active by taking the ratio of the probability of being active vs. being inactive (i.e., the odds are given by p/(1−p) if p is the predicted probability of the cellular signaling pathway being active).

In the exemplary Bayesian network model, the probabilistic relations have been made quantitative to allow for a quantitative probabilistic reasoning. In order to improve the generalization behavior across tissue types, the parameters describing the probabilistic relationships between (i) the TF element and the target gene(s) have been carefully hand-picked. If the TF element is “absent”, it is most likely that the target gene is “down”, hence a probability of 0.95 is chosen for this, and a probability of 0.05 for the target gene being “up”. The latter (non-zero) probability is to account for the (rare) possibility that the target gene is regulated by other factors or accidentally observed “up” (e.g., because of measurement noise). If the TF element is “present”, then with a probability of 0.70 the target gene is considered “up”, and with a probability of 0.30 the target gene is considered “down”. The latter values are chosen this way, because there can be several reasons why a target gene is not highly expressed even though the TF element is present, for instance, because the gene's promoter region is methylated. In the case that a target gene is not up-regulated by the TF element, but down-regulated, the probabilities are chosen in a similar way, but reflecting the down-regulation upon presence of the TF element. The parameters describing the relationships between (ii) the target gene(s) and their respective probesets have been calibrated on experimental data. For the latter, in this example, microarray data was used from cell line experiments with defined active and inactive pathway settings, but this could also be performed using patient samples with known cellular signaling pathway activity status.

Herein, publicly available data on the expression of a HUVEC cell line with a stable transfection of a FOXO construct that is inducible upon stimulation with 4OHT (GSE16573 available from the Gene Expression Omnibus) was used as an example. The cell lines with the inducible FOXO construct that were stimulated for 12 hours with 4OHT were considered as the FOXO active samples (n=3), whereas the passive FOXO samples were the cell lines with the construct without 4OHT stimulation (n=3).
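To illustrate how the hand-picked conditional probabilities lead to a read-out, the sketch below performs the backward inference for a strongly simplified network. It is a sketch under stated assumptions: the target gene states are taken as directly observed (the probeset layer is omitted), all target genes are assumed to be up-regulated by the TF element, and the uniform prior of 0.5 is an assumption that does not come from the text.

```python
P_UP_GIVEN_PRESENT = 0.70  # TF element "present" -> target gene "up"
P_UP_GIVEN_ABSENT = 0.05   # TF element "absent"  -> target gene "up"

def posterior_present(gene_states, prior_present=0.5):
    """P(TF element 'present' | observed 'up'/'down' states of target genes).

    Target genes are treated as conditionally independent given the TF state.
    """
    joint_present = prior_present
    joint_absent = 1.0 - prior_present
    for state in gene_states:
        if state == "up":
            joint_present *= P_UP_GIVEN_PRESENT
            joint_absent *= P_UP_GIVEN_ABSENT
        elif state == "down":
            joint_present *= 1.0 - P_UP_GIVEN_PRESENT
            joint_absent *= 1.0 - P_UP_GIVEN_ABSENT
        else:
            raise ValueError("state must be 'up' or 'down'")
    return joint_present / (joint_present + joint_absent)

def odds(p):
    """Translate a probability p into odds p / (1 - p)."""
    return p / (1.0 - p)

p_mixed = posterior_present(["up", "up", "down"])
p_all_down = posterior_present(["down", "down", "down"])
odds_mixed = odds(p_mixed)
```

Because a target gene is rarely “up” when the TF element is “absent” (0.05), two “up” observations already outweigh one “down” observation in this toy example.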

FIG. 2 shows training results of the exemplary Bayesian network model based on (A.) the evidence curated list of target genes of the PI3K cellular signaling pathway (cf. Table 1), (B.) the database-based list of target genes of the PI3K cellular signaling pathway (cf. Table 2), and (C.) the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3). In the diagram, the vertical axis indicates the odds that the FOXO TF element is “present” resp. “absent”, which corresponds to the PI3K cellular signaling pathway being inactive resp. active, wherein values above the horizontal axis correspond to the FOXO TF element being more likely “present”/active and values below the horizontal axis indicate that the odds that the FOXO TF element is “absent”/inactive are larger than the odds that it is “present”/active.

The third group 3 of three samples encompassing the cell lines that were not stimulated with tamoxifen and that are thus FOXO inactive was assigned a passive FOXO label, whereas the fourth group 4 encompassing the samples stimulated with 4OHT, which are thus FOXO active, was assigned an active label. In the same dataset, the first, second and fifth group 1, 2, 5 were correctly predicted to have a passive PI3K cellular signaling pathway. The last group 6 consists of cell lines transfected with a mutation variant of the FOXO that is expected to be insensitive towards 4OHT stimulation. Nevertheless, some activity was found in the second model (B.) and in the third model (C.). The model based on the evidence curated list of target genes of the PI3K cellular signaling pathway correctly predicts the PI3K cellular signaling pathway to be passive in the last group 6, whereas the other two lists predicted it to be active with a relatively low probability. (Legend: 1: Primary HUVECs infected with empty vector; 2: Primary HUVECs with empty vector + 12 h stimulation with OHT; 3: Primary HUVECs infected with FOXO.A3.ER vector; 4: Primary HUVECs with FOXO.A3.ER vector + 12 h stimulation with OHT; 5: Primary HUVECs infected with FOXO.A3.ER.H212R vector; 6: Primary HUVECs with FOXO.A3.ER.H212R vector + 12 h stimulation with OHT)

In the following, test results of the exemplary Bayesian network model are shown in FIGS. 3 to 6.

FIG. 3 show test results of the exemplary Bayesian network model basedon the shortlist of target genes of the PI3K cellular signaling pathway(cf. Table 3) for breast (cancer) samples of GSE17907. In the diagram,the vertical axis indicates the odds that the FOXO TF element is“present” resp. “absent”, which corresponds to the PI3K cellularsignaling pathway being inactive resp. active, wherein values above thehorizontal axis correspond to the FOXO TF element being more likely“present”/active and values below the horizontal axis indicate that theodds that the FOXO TF element is “absent”/inactive are larger than theodds that it is “present”/active. The model correctly predicts an activeFOXO TF element in the normal breast samples (group 5) as it is knownfrom the literature. The majority of the samples predicted to have apassive FOXO TF element are found in the ERBB2/HER2 subgroup (group 3),which is not unexpectedly as an over-amplification of the ERBB2 gene,which encodes for HER2, is scientifically linked to an activity of thePI3K cellular signaling pathway and, consequently, in the translocationof FOXO out of the nucleus resulting in inhibition of FOXO-regulatedtranscription. The breast cancer sample with the molecular subtype basal(group 2) is, as expected, predicted to have an inactive FOXO TFelement, since it is known that basal breast cancers typically lack HER2expression and are therefore not likely to have an active PI3K cellularsignaling pathway. (Legend: 1—Unknown, 2—Basal, 3—ERBB2/HER2, 4—LuminalA, 5—Normal breast, 6—Normal like).

FIG. 4 shows test results of the exemplary Bayesian network model basedon the shortlist of target genes of the PI3K cellular signaling pathway(cf. Table 3) for a number of healthy colon samples (group 1) andadenomatous polyps (group 2) published as the GSE8671 dataset. In thediagram, the vertical axis indicates the odds that the FOXO TF elementis “present” resp. “absent”, which corresponds to the PI3K cellularsignaling pathway being inactive resp. active, wherein values above thehorizontal axis correspond to the FOXO TF element being more likely“present”/active and values below the horizontal axis indicate that theodds that the FOXO TF element is “absent”/inactive are larger than theodds that it is “present”/active. The model correctly predicts an activePI3K cellular signaling pathway in the normal samples (group 1), wherethe PI3K cellular signaling pathway is expected to be working normally.With respect to the adenomatous polyps (group 2), it is known from theliterature that they express an increased activity of the PI3K cellularsignaling pathway as a result of mutation therein. Philips andcolleagues have shown that up to 86% of the colorectal tumors in theirstudy had an increased activity of the PI3K cellular signaling pathway(Wayne A. Philips, et al., “Increased levels of phosphatidylinositol3-kinase activity in colorectal tumors”, Cancer, Vol. 83, No. 1, July1998, pages 41 to 47). All but three of the adenoma samples werepredicted by the model as being FOXO passive, and, hence, PI3K active,which nicely correlates with the number found in the literature.(Legend: 1—Normal, 2—Adenoma).

FIG. 5 shows test results of the exemplary Bayesian network model basedon the shortlist of target genes of the PI3K cellular signaling pathway(cf. Table 3) for colon (cancer) samples of GSE20916. In the diagram,the vertical axis indicates the odds that the FOXO TF element is“present” resp. “absent”, which corresponds to the PI3K cellularsignaling pathway being inactive resp. active, wherein values above thehorizontal axis correspond to the FOXO TF element being more likely“present”/active and values below the horizontal axis indicate that theodds that the FOXO TF element is “absent”/inactive are larger than theodds that it is “present”/active. The model, again, correctly predictsthe normal samples to have an active FOXO TF element (groups 1 and 3),with the exception of the micro-dissected samples of the cryptepithelial cells (group 2), which likely have an active PI3K cellularsignaling pathway and a passive FOXO TF element as a result of theircontinuous proliferation and more stem cell-like behaviour (PatrickLaprise, et al., “Phosphatidylinositol 3-kinase controls humanintestinal epithelial cell differentiation by promoting adherensjunction assembly and p38 MAPK activation”, Journal of BiologicalChemistry, Vol. 277, No. 10, March 2002, pages 8226 to 8234).Unsurprisingly other FOXO passive samples are found in cancerous tissue(adenomas and carcinomas; groups 8 to 11). (Legend: 1—Normal colon(mucosa), 2—Normal colon (crypt), 3—Normal colon (surgery), 4—Distantnormal colon (mucosa), 5—Distant normal colon (crypt), 6—Adenoma(mucosa), 7—Adenoma (crypt), 8—Adenocarcinoma (surgery), 9—Carcinoma(mucosa), 10—Carcinoma (crypt), 11—Carcinoma (surgery))

FIG. 6 shows test results of the exemplary Bayesian network model basedon the shortlist of target genes of the PI3K cellular signaling pathway(cf. Table 3) for prostate (cancer) cells published in the GSE17951dataset. In the diagram, the vertical axis indicates the odds that theFOXO TF element is “present” resp. “absent”, which corresponds to thePI3K cellular signaling pathway being inactive resp. active, whereinvalues above the horizontal axis correspond to the FOXO TF element beingmore likely “present”/active and values below the horizontal axisindicate that the odds that the FOXO TF element is “absent”/inactive arelarger than the odds that it is “present”/active. All normal cells ofthe control group (group 2) are predicted to have an active FOXO TFelement, whereas a small fraction of the samples in the tumour group(group 3) and the biopsy group (group 1) are predicted to have FOXOtranscription silenced. In the literature, activity of the PI3K cellularsignaling pathway in prostate cancer is reported (e.g., Mari Kaarbø, etal., “PI3K-AKT-mTOR pathway is dominant over androgen receptor signalingin prostate cancer cells”, Cellular Oncology, Vol. 32, No. 1-2, 2010,pages 11 to 27). (Legend: 1—Biopsy, 2—Control, 3—Tumor)

FIG. 7 illustrates a prognosis of ER+ breast cancer patients (GSE6532 & GSE9195) depicted in a Kaplan-Meier plot. In the diagram, the vertical axis indicates the recurrence free survival as a fraction of the patient group and the horizontal axis indicates a time in years. The plot indicates that an active FOXO TF element (indicated by the less steep slope of the curve and by the curve ending above the other curve on the right side of the plot), which correlates with a passive PI3K cellular signaling pathway, is protective against recurrence, whereas having a passive FOXO TF element and, thus, an abnormally active PI3K cellular signaling pathway, is associated with a high risk of recurrence. (The patient group with a predicted active FOXO TF element consisted of 114 patients, whereas the patient group with a predicted passive FOXO TF element consisted of 50 patients). This result is also demonstrated in the hazard ratio of the predicted probability of FOXO transcription activity (using the probability of FOXO activity based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) as predictor): 0.45 (95% CI: 0.20-1.0, p<0.03).
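For illustration only, the recurrence free survival fraction plotted in a Kaplan-Meier diagram can be estimated as sketched below. The function name `kaplan_meier` and the toy follow-up times are assumptions made for this sketch; they are not taken from GSE6532 or GSE9195 and do not reproduce FIG. 7.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival fraction.

    times  -- follow-up time per patient (e.g. in years)
    events -- 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival_fraction) at each recurrence time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Collect all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1.0 - recurrences / n_at_risk
            curve.append((t, survival))
        # Both recurrences and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve


# Toy data (hypothetical): four patients, two recurrences, two censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0])
```

In such a sketch, one curve would be computed per patient group (predicted active versus predicted passive FOXO TF element), and the curve that stays higher corresponds to the group with fewer recurrences over time.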

b) Exemplary (Pseudo-)Linear Model

Before the (pseudo-)linear models as exemplarily described herein can be used to infer pathway activity in a test sample, the weights indicating the sign and magnitude of the association between the nodes and a threshold to call whether a node is either “absent” or “present” need to be determined. One can use expert knowledge to fill in the weights and threshold a priori, but typically models are trained using a representative set of training samples, of which preferably the ground truth is known, e.g. expression data of probesets in samples with a known present transcription factor complex (= active pathway) or absent transcription factor complex (= passive pathway). However, it is impractical to obtain training samples from many different kinds of cancers, of which it is known what the activation status of the pathway to be modeled is. As a result, available training sets consist of a limited number of samples, typically from one type of cancer only. Herein a method is described to determine the parameters necessary to classify test samples as having an active or passive pathway.

Known in the field are a multitude of training algorithms (e.g. regression) that take into account the model topology and change the model parameters, here weight and threshold, such that the model output, here weighted linear score, is optimized. Herein we demonstrate two exemplary methods that can be used to calculate the weights directly from the expression levels without the need of an optimization algorithm.

The first method, defined here as the “black and white”-method, boils down to a ternary system with the weighting factors being an element of {−1, 0, 1}. Put in the biological context, the −1 and 1 correspond to genes or probes that are down- and upregulated in case of PI3K cellular signaling pathway activity, respectively. In case a probe or gene cannot be statistically proven to be either up- or downregulated, it receives a weight of 0. Here one can use a left-sided and right-sided, two sample t-test of the expression levels of the active PI3K cellular signaling pathway samples versus the expression levels of the samples with a passive PI3K cellular signaling pathway to determine whether a probe or gene is up- or downregulated given the used training data. In cases where the average of the active samples is statistically larger than the passive samples, i.e. the p-value is below a certain threshold, e.g. 0.3, the probeset or target gene is determined to be upregulated. Conversely, in cases where the average of the active samples is statistically lower than the passive samples, this probeset or target gene is determined to be downregulated upon activation of the PI3K cellular signaling pathway. In case the lowest p-value (left- or right-sided) exceeds the aforementioned threshold, the weight of this probe or gene can be defined to be 0.
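The ternary weighting can be sketched as follows. Note that, purely to keep the sketch free of external libraries, an exact one-sided permutation test on the difference of means is used here in place of the one-sided two sample t-test named above; the function names, the default threshold of 0.3 and the toy expression values are assumptions made for illustration.

```python
from itertools import combinations


def one_sided_p(group_a, group_b):
    """Exact permutation p-value for 'mean of group_a > mean of group_b'.

    Stand-in for a one-sided two sample t-test (assumption of this sketch).
    """
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / k - (total - sum_a) / (len(pooled) - k)

    observed = mean_diff(sum(group_a))
    hits = 0
    count = 0
    # Enumerate every way of relabelling k of the pooled samples as group_a.
    for idx in combinations(range(len(pooled)), k):
        count += 1
        if mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / count


def black_and_white_weight(active, passive, p_threshold=0.3):
    """Return 1 (upregulated), -1 (downregulated) or 0 (undetermined)."""
    p_up = one_sided_p(active, passive)      # active larger than passive
    p_down = one_sided_p(passive, active)    # active lower than passive
    if min(p_up, p_down) > p_threshold:
        return 0
    return 1 if p_up < p_down else -1


# Toy probeset intensities (hypothetical), three samples per group.
w = black_and_white_weight([8.1, 8.4, 8.0], [6.0, 6.2, 5.9])
```

With three samples per group, as in the training set described further below, the smallest p-value such a permutation test can return is 1/20 = 0.05, which is one reason a lenient threshold may be appropriate for small training sets.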

An alternative method to come to weights and threshold(s) is based on the logarithm (e.g. base e) of the odds ratio, and therefore called “log odds”-weights. The odds ratio for each probe or gene is calculated based on the number of positive and negative training samples for which the probe/gene level is above and below a corresponding threshold, e.g. the median of all training samples (equation 3 in WO 2014/102668 A2). A pseudo-count can be added to circumvent divisions by zero (equation 4 in WO 2014/102668 A2). A further refinement is to count the samples above/below the threshold in a somewhat more probabilistic manner, by assuming that the probe/gene levels are e.g. normally distributed around its observed value with a certain specified standard deviation (e.g. 0.25 on a 2-log scale), and counting the probability mass above and below the threshold (equation 5 in WO 2014/102668 A2).
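A minimal sketch of such a “log odds”-weight is given below. The exact form of equations 3 to 5 of WO 2014/102668 A2 is not reproduced in this text, so the odds-ratio expression, the placement of the pseudo-count and the names `log_odds_weight` and `_mass_above` are assumptions of this sketch rather than the referenced equations.

```python
import math
from statistics import NormalDist, median


def _mass_above(value, threshold, sd):
    """Contribution of one sample to the 'above threshold' count."""
    if sd is None:
        # Hard count: the sample is either above the threshold or not.
        return 1.0 if value > threshold else 0.0
    # Soft count: probability mass of N(value, sd) above the threshold.
    return 1.0 - NormalDist(mu=value, sigma=sd).cdf(threshold)


def log_odds_weight(active, passive, pseudocount=1.0, sd=None):
    """Natural log of the odds ratio for one probe or gene.

    active / passive -- expression levels (2-log scale) of the positive
                        and negative training samples
    pseudocount      -- must be > 0; avoids division by zero
    sd               -- None for hard counts, else the assumed standard
                        deviation around each observed value
    """
    threshold = median(list(active) + list(passive))
    a_above = sum(_mass_above(x, threshold, sd) for x in active)
    a_below = len(active) - a_above
    p_above = sum(_mass_above(x, threshold, sd) for x in passive)
    p_below = len(passive) - p_above
    odds_active = (a_above + pseudocount) / (a_below + pseudocount)
    odds_passive = (p_above + pseudocount) / (p_below + pseudocount)
    return math.log(odds_active / odds_passive)


# Toy probeset intensities (hypothetical).
hard = log_odds_weight([8.1, 8.4, 8.0], [6.0, 6.2, 5.9])
soft = log_odds_weight([8.1, 8.4, 8.0], [6.0, 6.2, 5.9], sd=0.25)
```

A positive weight indicates a probe or gene whose level tends to lie above the threshold in the positive training samples, a negative weight the opposite; a larger pseudo-count shrinks the weights towards zero.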

Alternatively, one can employ optimization algorithms known in the field such as regression to determine the weights and the threshold(s) of the (pseudo-)linear models described herein.

One has to pay special attention to the way the parameters are determined for the (pseudo-)linear models to generalize well. Alternatively, one can use other machine learning methods such as Bayesian networks that are known in the field to be able to generalize quite well by taking special measures during training procedures.

With reference to FIG. 8, an exemplary “two-layer” (pseudo-)linear model of the PI3K cellular signaling pathway using all target genes from the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) and all probesets of these target genes on the first and second layer, respectively, was trained using continuous data on the expression of a HUVEC cell line with a stable transfection of a FOXO construct that is inducible upon stimulation with 4OHT (GSE16573 available from the Gene Expression Omnibus) (cf. also the above description for the exemplary Bayesian network model). The cell lines with the inducible FOXO construct that were stimulated for 12 hours with 4OHT were considered as the FOXO active samples (n=3), whereas the passive FOXO samples were the cell lines with the construct without 4OHT stimulation (n=3). The training encompassed calculating the weights of the connections between the target genes expression levels, here represented by means of probeset intensities, and the target genes nodes using the “log odds”-method with a pseudocount of 10, as described herein. Subsequently, the activity score of the FOXO TF element was calculated by summation of the calculated target genes expression scores multiplied by either 1 or −1 for upregulated or downregulated target genes, respectively.
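The two-layer calculation can be sketched as below, under the following assumptions made for illustration: probeset intensities are taken to be already centred on their respective thresholds, the first-layer weights are treated as non-negative magnitudes, and the gene names, probeset identifiers and numeric values are hypothetical placeholders rather than entries of Table 3.

```python
def gene_score(sample, probesets, weights):
    """First layer: weighted sum of the probeset intensities of one gene."""
    return sum(w * sample[p] for p, w in zip(probesets, weights))


def tf_activity_score(sample, model):
    """Second layer: sum of gene scores, each multiplied by +1 or -1."""
    total = 0.0
    for spec in model.values():
        score = gene_score(sample, spec["probesets"], spec["weights"])
        total += spec["direction"] * score
    return total


# Hypothetical model: one upregulated and one downregulated target gene.
toy_model = {
    "GENE_UP": {"probesets": ["ps1", "ps2"], "weights": [0.5, 0.5], "direction": 1},
    "GENE_DOWN": {"probesets": ["ps3"], "weights": [0.5], "direction": -1},
}

# Hypothetical threshold-centred intensities of one test sample.
toy_sample = {"ps1": 1.0, "ps2": 2.0, "ps3": -1.0}

toy_score = tf_activity_score(toy_sample, toy_model)
```

In this toy sample the upregulated gene is expressed above threshold and the downregulated gene below threshold, so both contribute positively to the activity score of the TF element.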

In the diagram shown in FIG. 8, the vertical axis shows the weighted linear score, wherein a positive resp. negative score indicates that the FOXO TF element is “present” resp. “absent”, which corresponds to the PI3K cellular signaling pathway being inactive resp. active. The third group 3 of three samples encompassing the cell lines that were not stimulated with tamoxifen and that are thus FOXO inactive was assigned a passive FOXO label, whereas the fourth group 4 encompassing the samples stimulated with 4OHT, which are thus FOXO active, was assigned an active label. In the same dataset, the first, second and fifth group 1, 2, 5 were correctly predicted to have a passive PI3K cellular signaling pathway. The last group 6 consists of cell lines transfected with a mutation variant of the FOXO that is expected to be insensitive towards 4OHT stimulation. Nevertheless, some activity was also found in the sixth group using the trained (pseudo-)linear model. (Legend: 1—Primary HUVECs infected with empty vector, 2—Primary HUVECs with empty vector + 12 h stimulation with OHT, 3—Primary HUVECs infected with FOXO.A3.ER vector, 4—Primary HUVECs with FOXO.A3.ER vector + 12 h stimulation with OHT, 5—Primary HUVECs infected with FOXO.A3.ER.H212R vector, 6—Primary HUVECs with FOXO.A3.ER.H212R vector + 12 h stimulation with OHT)
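The inverse relation between the sign of the weighted linear score and the inferred pathway activity can be made explicit as in the following sketch. The function name `interpret_score` and the default threshold of zero are assumptions for illustration.

```python
def interpret_score(score, threshold=0.0):
    """Map a weighted linear score to FOXO and PI3K pathway calls.

    A score above the threshold is read as FOXO TF element "present",
    which corresponds to an inactive PI3K pathway, and vice versa.
    """
    if score > threshold:
        return {"FOXO": "present", "PI3K": "inactive"}
    return {"FOXO": "absent", "PI3K": "active"}


example_call = interpret_score(2.0)
```

The point of the sketch is only that the two calls always move in opposite directions: a sample called FOXO “present” is never called PI3K “active”.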

In the following, test results of the exemplary (pseudo-)linear model are shown in FIGS. 9 and 10.

FIG. 9 shows test results of the exemplary (pseudo-)linear model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for breast (cancer) samples of GSE17907. In the diagram, the vertical axis indicates the score that the FOXO TF element is “present” resp. “absent”, which corresponds to the PI3K cellular signaling pathway being inactive resp. active, wherein values above the horizontal axis correspond to the FOXO TF element being more likely “present”/active and values below the horizontal axis indicate that the odds that the FOXO TF element is “absent”/inactive are larger than the odds that it is “present”/active. The model correctly predicts an active FOXO TF element in the normal breast samples (group 5), as is known from the literature. The majority of the samples predicted to have a passive FOXO TF element are found in the ERBB2/HER2 group (group 3), which is not unexpected, as an over-amplification of the ERBB2 gene, which encodes HER2, is scientifically linked to an activity of the PI3K cellular signaling pathway and, consequently, to the translocation of FOXO out of the nucleus resulting in inhibition of FOXO-regulated transcription. (Legend: 1—Unknown, 2—Basal, 3—ERBB2/HER2, 4—Luminal A, 5—Normal breast, 6—Normal like)

FIG. 10 shows test results of the exemplary (pseudo-)linear model based on the shortlist of target genes of the PI3K cellular signaling pathway (cf. Table 3) for prostate (cancer) samples of GSE17951. In the diagram, the vertical axis indicates the score that the FOXO TF element is “present” resp. “absent”, which corresponds to the PI3K cellular signaling pathway being inactive resp. active, wherein values above the horizontal axis correspond to the FOXO TF element being more likely “present”/active and values below the horizontal axis indicate that the odds that the FOXO TF element is “absent”/inactive are larger than the odds that it is “present”/active. All normal cells of the control group (group 2) are predicted to have an active FOXO TF element, whereas a small fraction of the samples in the tumor group (group 3) and a larger fraction in the biopsy group (group 1) are predicted to have FOXO transcription silenced, corresponding to an increased activity of the PI3K cellular signaling pathway. In the literature, activity of the PI3K cellular signaling pathway in prostate cancer is reported (e.g., Mari Kaarbø, et al., “PI3K-AKT-mTOR pathway is dominant over androgen receptor signaling in prostate cancer cells”, Cellular Oncology, Vol. 32, No. 1-2, 2010, pages 11 to 27), which is confirmed in these results. (Legend: 1—Biopsy, 2—Control, 3—Tumor)

Instead of applying the mathematical model, e.g., the exemplary Bayesian network model or the (pseudo-)linear model, on mRNA input data coming from microarrays or RNA sequencing, it may be beneficial in clinical applications to develop dedicated assays to perform the sample measurements, for instance on an integrated platform using qPCR to determine mRNA levels of target genes. The RNA/DNA sequences of the disclosed target genes can then be used to determine which primers and probes to select on such a platform.

Validation of such a dedicated assay can be done by using the microarray-based mathematical model as a reference model, and verifying whether the developed assay gives similar results on a set of validation samples. Next to a dedicated assay, this can also be done to build and calibrate similar mathematical models using mRNA-sequencing data as input measurements.
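One simple way to quantify whether a developed assay “gives similar results” is the fraction of validation samples on which both models make the same active/passive call. The sketch below is one possible check under that assumption; the function name `call_concordance`, the zero threshold and the toy scores are hypothetical.

```python
def call_concordance(reference_scores, assay_scores, threshold=0.0):
    """Fraction of samples with identical calls in both models.

    reference_scores -- scores from the microarray-based reference model
    assay_scores     -- scores from the dedicated assay, same sample order
    """
    if len(reference_scores) != len(assay_scores):
        raise ValueError("score lists must have the same length")
    if not reference_scores:
        raise ValueError("at least one validation sample is required")
    agreeing = sum(
        (ref > threshold) == (new > threshold)
        for ref, new in zip(reference_scores, assay_scores)
    )
    return agreeing / len(reference_scores)


# Toy scores (hypothetical) for four validation samples.
toy_agreement = call_concordance([1.2, -0.5, 0.3, -2.0], [0.9, -0.1, -0.2, -1.5])
```

Other agreement measures, such as a correlation of the continuous scores, could be used in the same place.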

The set of target genes which are found to best indicate specific pathway activity, based on microarray/RNA sequencing based investigation using the mathematical model, e.g., the exemplary Bayesian network model or the (pseudo-)linear model, can be translated into a multiplex quantitative PCR assay to be performed on an extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject and/or a computer to interpret the expression measurements and/or to infer the activity of the PI3K cellular signaling pathway. To develop such a test (e.g., FDA-approved or a CLIA waived test in a central service lab) for cellular signaling pathway activity, development of a standardized test kit is required, which needs to be clinically validated in clinical trials to obtain regulatory approval.

The present invention relates to a method comprising inferring activity of a PI3K cellular signaling pathway in a tissue and/or cells and/or a body fluid of a medical subject based at least on expression levels of one or more target gene(s) of the PI3K cellular signaling pathway measured in an extracted sample of the tissue and/or the cells and/or the body fluid of the medical subject. The present invention further relates to an apparatus comprising a digital processor configured to perform such a method, a non-transitory storage medium storing instructions that are executable by a digital processing device to perform such a method, and a computer program comprising program code means for causing a digital processing device to perform such a method.

The method may be used, for instance, in diagnosing an (abnormal) activity of the PI3K cellular signaling pathway, in prognosis based on the inferred activity of the PI3K cellular signaling pathway, in the enrollment of a medical subject in a clinical trial based on the inferred activity of the PI3K cellular signaling pathway, in the selection of subsequent test(s) to be performed, in the selection of companion diagnostics tests, in clinical decision support systems, or the like. In this regard, reference is made to the published international patent application WO 2013/011479 A2 (“Assessment of cellular signaling pathway activity using probabilistic modeling of target gene expression”) and to the published international patent application WO 2014/102668 A2 (“Assessment of cellular signaling pathway activity using linear combination(s) of target gene expressions”), which describe these applications in more detail.

SEQUENCE LISTING:

Seq. No.: Gene:
Seq. 1 AGRP
Seq. 2 ATG14
Seq. 3 ATP8A1
Seq. 4 BCL2L11
Seq. 5 BCL6
Seq. 6 BIRC5
Seq. 7 BNIP3
Seq. 8 BTG1
Seq. 9 C10orf10
Seq. 10 CAT
Seq. 11 CAV1
Seq. 12 CBLB
Seq. 13 CCND1
Seq. 14 CCND2
Seq. 15 CCNG2
Seq. 16 CDKN1A
Seq. 17 CDKN1B
Seq. 18 DDB1
Seq. 19 DYRK2
Seq. 20 ERBB3
Seq. 21 EREG
Seq. 22 ESR1
Seq. 23 EXT1
Seq. 24 FASLG
Seq. 25 FBXO32
Seq. 26 FGFR2
Seq. 27 GADD45A
Seq. 28 IGF1R
Seq. 29 IGFBP1
Seq. 30 IGFBP3
Seq. 31 INSR
Seq. 32 KLF2
Seq. 33 KLF4
Seq. 34 LGMN
Seq. 35 MXI1
Seq. 36 MYOD1
Seq. 37 NOS3
Seq. 38 PCK1
Seq. 39 PDK4
Seq. 40 POMC
Seq. 41 PPARGC1A
Seq. 42 PPM1D
Seq. 43 PRDX3
Seq. 44 RAG1
Seq. 45 RAG2
Seq. 46 RBL2
Seq. 47 SEMA3C
Seq. 48 SEPP1
Seq. 49 SESN1
Seq. 50 SIRT1
Seq. 51 SLC5A3
Seq. 52 SMAD4
Seq. 53 SOD2
Seq. 54 STK11
Seq. 55 TLE4
Seq. 56 TNFSF10
Seq. 57 TXNIP

1. A kit for determining abnormal activity of a PI3K cellular signaling pathway in a subject, comprising: one or more components for determining expression levels of three or more target genes of the PI3K cellular signaling pathway in a sample of the subject, wherein the one or more components are primers and probes for detecting the expression levels of the three or more target genes, and wherein the three or more target genes are selected from the group consisting of: AGRP, BCL2L11, BCL6, BNIP3, BTG1, CAT, CAV1, CCND1, CCND2, CCNG2, CDKN1A, CDKN1B, ESR1, FASLG, FBXO32, GADD45A, INSR, MXI1, NOS3, PCK1, POMC, PPARGC1A, PRDX3, RBL2, SOD2 and TNFSF10; and a digital processing device configured to determine the abnormal activity of the PI3K cellular signaling pathway in the subject, wherein the inferring comprises: determining a level of a FOXO transcription factor (TF) element in the sample, the FOXO TF element controlling transcription of the three or more target genes of the PI3K cellular signaling pathway, the determining being based at least in part on evaluating a mathematical model relating expression levels of the three or more target genes of the PI3K cellular signaling pathway to the level of the FOXO TF element; inferring the activity of the PI3K cellular signaling pathway subject based on the determined level of the FOXO TF element in the sample; and determining that the PI3K cellular signaling pathway is operating abnormally in the subject based on the inferred activity of the PI3K cellular signaling pathway in the sample.
2. The kit according to claim 1, further comprising primers and probes for detecting the expression levels of an additional target gene selected from the group consisting of: ATP8A1, C10orf10, CBLB, DDB1, DYRK2, ERBB3, EREG, EXT1, FGFR2, IGF1R, IGFBP1, IGFBP3, LGMN, PPM1D, SEMA3C, SEPP1, SESN1, SLC5A3, SMAD4 and TLE4.
3. The kit according to claim 1, further comprising primers and probes for detecting the expression levels of an additional target gene selected from the group consisting of: ATG14, BIRC5, IGFBP1, KLF2, KLF4, MYOD1, PDK4, RAG1, RAG2, SESN1, SIRT1, STK11 and TXNIP.
4. The kit according to claim 1, wherein the kit comprises primers and probes for determining the expression levels of at least nine target genes selected from the group consisting of: AGRP, BCL2L11, BCL6, BNIP3, BTG1, CAT, CAV1, CCND1, CCND2, CCNG2, CDKN1A, CDKN1B, ESR1, FASLG, FBXO32, GADD45A, INSR, MXI1, NOS3, PCK1, POMC, PPARGC1A, PRDX3, RBL2, SOD2 and TNFSF10.
5. The kit according to claim 4, further comprising primers and probes for detecting the expression levels of an additional target gene selected from the group consisting of: ATP8A1, C10orf10, CBLB, DDB1, DYRK2, ERBB3, EREG, EXT1, FGFR2, IGF1R, IGFBP1, IGFBP3, LGMN, PPM1D, SEMA3C, SEPP1, SESN1, SLC5A3, SMAD4 and TLE4.
6. The kit according to claim 4, further comprising primers and probes for detecting the expression levels of an additional target gene selected from the group consisting of: ATG14, BIRC5, IGFBP1, KLF2, KLF4, MYOD1, PDK4, RAG1, RAG2, SESN1, SIRT1, STK11 and TXNIP.
7. The kit according to claim 1, wherein the kit comprises primers and probes for determining the expression levels of all target genes selected from the group consisting of: AGRP, BCL2L11, BCL6, BNIP3, BTG1, CAT, CAV1, CCND1, CCND2, CCNG2, CDKN1A, CDKN1B, ESR1, FASLG, FBXO32, GADD45A, INSR, MXI1, NOS3, PCK1, POMC, PPARGC1A, PRDX3, RBL2, SOD2 and TNFSF10.
8. The kit according to claim 1, wherein the kit is for determining the abnormal activity of the PI3K cellular signaling pathway in an extracted sample of a tissue and/or cells and/or body fluid of the subject.
9. The kit according to claim 1, wherein determining further comprises: recommending prescribing a drug for the subject that corrects for the determined abnormal operation of the PI3K cellular signaling pathway.
10. The kit according to claim 1, wherein determining further comprises one or more of: diagnosis based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject; prognosis based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject; drug prescription based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject; prediction of drug efficacy based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject; prediction of adverse effects based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject; monitoring of drug efficacy; drug development; assay development; pathway research; cancer staging; enrollment of the medical subject in a clinical trial based on the inferred activity of the PI3K cellular signaling pathway in the tissue and/or the cells and/or the body fluid of the medical subject; selection of subsequent test to be performed; and selection of companion diagnostics tests.
11. The kit according to claim 1, wherein the determined abnormal operation of the PI3K cellular signaling pathway is overactive.
12. The kit according to claim 11, wherein determining further comprises recommending prescribing a specific treatment for the subject that inhibits or deregulates the activity of the PI3K cellular signaling pathway.
13. The kit according to claim 11, wherein determining further comprises providing, based on the determined abnormal operation of the PI3K cellular signaling pathway, a recommendation for medical intervention configured to inhibit or deregulate the activity of the PI3K cellular signaling pathway.
14. The kit according to claim 13, wherein the recommendation for medical intervention configured to inhibit or deregulate the activity of the PI3K cellular signaling pathway comprises recommending or selecting a drug that inhibits or downregulates the activity of the PI3K cellular signaling pathway.
15. The kit according to claim 1, further comprising a non-transitory storage medium storing instructions that are executable by the digital processing device to determine the abnormal activity of the PI3K cellular signaling pathway in the subject.
16. The kit according to claim 1, further comprising a computer program comprising program code means for causing the digital processing device to determine the abnormal activity of the PI3K cellular signaling pathway in the subject.